Biological Knowledge Integration in DNA Microarray Gene Expression Classification Based on Rough Set Theory

Authors

  • Diego Calvo-Dmgz
  • Juan F. Gálvez
  • Daniel Glez-Peña
  • Florentino Fernández Riverola
Abstract

DNA microarrays have contributed to the exponential growth of genetic data in recent years. This large amount of gene expression data has been used in research seeking to diagnose diseases such as cancer using classification methods. In turn, explicit biological knowledge about gene functions has also grown tremendously over the last decade. This work integrates explicit biological knowledge into the classification process using Rough Set Theory, making it more effective. In addition, the proposed model is able to indicate which part of the biological knowledge has been important for classification. The classification process is divided into five steps. First, supergenes are created, which summarize the information of intersections of sets of genes (called basic categories) from biological knowledge using Principal Component Analysis. Second, the continuous values of the supergenes are discretized using Discriminant Fuzzy Patterns. Third, the most relevant supergenes are selected using the criterion of maximum β-relevance, supported by Rough Set Theory. Fourth, decision rules are generated using the CAI (Conjuntos Aproximados con Incertidumbre) model; these rules are the basis of the final classifier. Finally, a classifier is constructed from the decision rules generated in the previous step, giving them an order of application based on a score. The proposed model is evaluated on a set of samples from DNA microarrays together with explicit biological knowledge expressed as sets of genes that may or may not be related to the concept to be classified, obtaining successful results compared to well-known classification techniques.
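The abstract does not define the β-relevance criterion, so the following is only a rough sketch of the general idea behind the third step, assuming a standard variable-precision rough set dependency degree as a stand-in. Samples with identical discretized supergene values are grouped into equivalence classes, and a class counts as "covered" when at least a fraction β of its samples share one label. The function names, the supergene names `sg1` and `sg2`, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def equivalence_classes(samples, attributes):
    """Group sample indices that share the same values on `attributes`."""
    classes = defaultdict(list)
    for idx, sample in enumerate(samples):
        key = tuple(sample[a] for a in attributes)
        classes[key].append(idx)
    return list(classes.values())


def beta_dependency(samples, labels, attributes, beta):
    """Fraction of samples lying in equivalence classes whose majority
    label reaches a share of at least `beta` (an assumed stand-in for
    the paper's beta-relevance, which the abstract does not define)."""
    covered = 0
    for block in equivalence_classes(samples, attributes):
        counts = defaultdict(int)
        for idx in block:
            counts[labels[idx]] += 1
        if max(counts.values()) / len(block) >= beta:
            covered += len(block)
    return covered / len(samples)


# Hypothetical discretized supergene values for eight samples.
samples = [
    {"sg1": "high", "sg2": "low"},
    {"sg1": "high", "sg2": "high"},
    {"sg1": "high", "sg2": "low"},
    {"sg1": "high", "sg2": "high"},
    {"sg1": "low", "sg2": "low"},
    {"sg1": "low", "sg2": "high"},
    {"sg1": "low", "sg2": "low"},
    {"sg1": "low", "sg2": "high"},
]
labels = ["tumor", "tumor", "tumor", "normal",
          "normal", "normal", "normal", "normal"]

# Rank supergenes by their individual dependency degree at beta = 0.75.
scores = {sg: beta_dependency(samples, labels, [sg], 0.75)
          for sg in ("sg1", "sg2")}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

In this toy data, lowering β from 1.0 to 0.75 lets the `sg1 = high` class (three tumor samples, one normal) count as covered, which illustrates how a variable-precision threshold tolerates some noise in the expression data.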

Similar resources

Integration and Reduction of Microarray Gene Expressions Using an Information Theory Approach

The DNA microarray is an important technique that allows researchers to analyze large amounts of gene expression data in parallel. Although the data can be more significant if they come from separate experiments, one of the most challenging phases in the microarray context is the integration of separate expression level datasets that have been gathered through different techniques. In this paper, we prese...

Using Variable Precision Rough Set for Selection and Classification of Biological Knowledge Integrated in DNA Gene Expression

DNA microarrays have contributed to the exponential growth of genomic and experimental data in the last decade. This large amount of gene expression data has been used by researchers seeking diagnosis of diseases like cancer using machine learning methods. In turn, explicit biological knowledge about gene functions has also grown tremendously over the last decade. This work integrates explicit ...

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression gives access to a wealth of information spanning thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce high-dimensional data by a statistical method to select valuable genes with high impact as biomarkers and then classify ovarian tumors based on gene expression data of...

Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets

With the advancement of metagenome data mining, research has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection (feature selection) is essential. Removing redundant genes can increase the speed and accuracy of classification. After applying the gene se...

Scalable Sequential Rough Parallel Bounded Symmetrical Clustering for Gene Expression Profile Analysis

The study of gene expression profiles of tissues and cells has become a major tool for discovery in medicine. Identification of co-expressed genes and coherent patterns is the central goal of gene expression profiling and an important task in bioinformatics research. Clustering is an important unsupervised learning technique for Gene Expression Profile Analysis. Many conventional...

Publication date: 2012